Bilingual Connections for Trilingual Corpora: An XML Approach

Authors

  • Victoria Arranz
  • Núria Castell
  • Josep Maria Crego
  • Jesús Giménez
  • Adrià de Gispert
  • Patrik Lambert

Abstract

This paper describes the design and development of a trilingual spontaneous speech corpus for statistical speech-to-speech translation. The languages considered are Catalan, Spanish and US English. This corpus has been built bearing in mind the strong need for multilingual collections of on-line data within the area of statistical translation, as well as the need for data that are parallel or aligned, that contain different types of linguistic information and that can be used by different translation systems. For that reason, our aim has been the creation of a linguistically enriched resource with an XML-based DTD that allows useful, transparent and flexible storage of the data. Moreover, these resources are also valuable for a wide range of Natural Language Processing applications, such as multilingual resource acquisition or word sense discrimination, among others.

Related resources

Looking for Transliterations in a Trilingual English, French and Japanese Specialised Comparable Corpus

Transliterations and cognates have been shown to be useful in the case of bilingual extraction from parallel corpora. Observation of transliterations in a trilingual English, French and Japanese specialised comparable corpus reveals evidences that they are likely to be used with comparable corpora too, since they are an important and relevant part of the common vocabulary, but they also yield l...

Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian languages

This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by the visitors of the website of the Nordic council containing six different languages. In order to compare how well different types of bilingual dictionaries covered the most common ...

Acquisition of Medical Terminology for Ukrainian from Parallel Corpora and Wikipedia

The increasing availability of parallel bilingual corpora and of automatic methods and tools for their processing makes it possible to build linguistic and terminological resources for low-resourced languages. We propose to exploit various corpora available in several languages in order to build bilingual and trilingual terminologies. Typically, terminology information extracted in French and E...

Towards a Description of Trilingual Competence

Most studies involving trilingualism have been carried out within the theoretical framework of bilingualism research. No attempt has been made to delimit trilingualism as a concept in its own right, and often it has been assumed to be an extension of bilingualism. In young children, trilingual language acquisition largely follows the path of bilingual acquisition. With regard to language behavi...

Speech perception in noise by monolingual, bilingual and trilingual listeners.

BACKGROUND There is strong evidence that bilinguals have a deficit in speech perception for their second language compared with monolingual speakers under unfavourable listening conditions (e.g., noise or reverberation), despite performing similarly to monolingual speakers under quiet conditions. This deficit persists for speakers highly proficient in their second language and is greater in tho...

Publication date: 2004